Automatically Constructing a Dictionary for Information Extraction Tasks
Abstract
Knowledge-based natural language processing systems have achieved good success with certain tasks but they are often criticized because they depend on a domain-specific dictionary that requires a great deal of manual knowledge engineering. This knowledge engineering bottleneck makes knowledge-based NLP systems impractical for real-world applications because they cannot be easily scaled up or ported to new domains. In response to this problem, we developed a system called AutoSlog that automatically builds a domain-specific dictionary of concepts for extracting information from text. Using AutoSlog, we constructed a dictionary for the domain of terrorist event descriptions in only 5 person-hours. We then compared the AutoSlog dictionary with a hand-crafted dictionary that was built by two highly skilled graduate students and required approximately 1500 person-hours of effort. We evaluated the two dictionaries using two blind test sets of 100 texts each. Overall, the AutoSlog dictionary achieved 98% of the performance of the hand-crafted dictionary. On the first test set, the AutoSlog dictionary obtained 96.3% of the performance of the hand-crafted dictionary. On the second test set, the overall scores were virtually indistinguishable with the AutoSlog dictionary achieving 99.7% of the performance of the hand-
Similar resources
Using learned extraction patterns for text classification
A major knowledge-engineering bottleneck for information extraction systems is the process of constructing an appropriate dictionary of extraction patterns. AutoSlog is a dictionary construction system that has been shown to substantially reduce the time required for knowledge engineering by learning extraction patterns automatically. However, an open question was whether these extraction patte...
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the growth of digital content on the internet and social media, sentiment analysis has become an emerging field. This problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...
Automatic Construction of Knowledge Base from Biological Papers
We designed a system that acquires domain-specific knowledge from human-written biological papers, and we call this system IFBP (Information Finding from Biological Papers). IFBP is divided into three phases: Information Retrieval (IR), Information Extraction (IE), and Dictionary Construction (DC). We propose a query modification method using an automatically constructed thesaurus for IR and a stat...
Constructing a Dictionary of Biological Terms for Information Extraction
In information extraction (IE) systems, a keyword dictionary, which is a kind of knowledge base of domain-specific information, is very important. We are developing technologies that construct a keyword dictionary for IE.
Extracting Lexical-Semantic Knowledge from the Portuguese Wiktionary
Public domain collaborative resources like Wiktionary and Wikipedia have recently become attractive sources for information extraction. To use these resources in natural language processing (NLP) tasks, efficient programmatic access to their contents is required. In this work, we have extracted semantic relations automatically from the Portuguese Wiktionary and compared our results with the re...
Publication date: 1993